NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

Thompson, Kyle; Saanvedra, Nuno; Carrott, Pedro; Fisher, Kevin; Sanchez-Stern, Alex; Brun, Yuriy; Ferreira, Joao F; Lerner, Sorin; First, Emily (April 2025, Proceedings of the IEEE/ACM International Conference on Software Engineering)

Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the- art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
more » « less
Full Text Available
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

An, Chenyang; Chen, Zhibo; Ye, Qihao; First, Emily; Peng, Letian; Zhang, Jiayun; Wang, Zihan; Lerner, Sorin; Shang, Jingbo (August 2024, The 62nd Annual Meeting of the Association for Computational Linguistics)

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TRIALMASTER) with models trained only on the correct paths and discover that the former solves more unseen theorems with lower trial searches.
more » « less
Full Text Available
Baldur: Whole-Proof Generation and Repair with Large Language Models

https://doi.org/10.1145/3611643.3616243

First, Emily; Rabe, Markus; Ringer, Talia; Brun, Yuriy (November 2023, ACM)
Reappraising Crystallization Kinetics with Overgrowth Chronometry: an in Situ Study of Olivine Growth Velocities

https://doi.org/10.1093/petrology/egad055

Welsch, Benoît; Faure, François; First, Emily C (August 2023, Journal of Petrology)

Abstract We investigated the early stages of olivine crystal growth via in situ seeded experiments in a single plagioclase-hosted melt inclusion, using a heating stage microscope. Each experiment was subjected to a cooling ramp of 7800°C/h followed by an isothermal dwell at 19°C, 38°C, 57°C, 77°C, 96°C or 129°C of undercooling. The seeds (6–16 μm in diameter Ø) grew into large crystals (Ø 80–169 μm) in 3 to 30 min through the symmetrical development of tabular, skeletal, and dendritic overgrowths as the undercooling of the system increased. Time-resolved image processing and incremental measurements of the overgrowth thicknesses indicate up to three stages of crystal growth: an acceleration stage, a linear (constant growth rate) stage, and a deceleration stage. At the isotherm, the growth velocities reach a stable maximum that in all experiments corresponds to the period of linear growth. The highest linear values are measured at the {101} interfaces, from 2.1 × 10−8 m/s at 19°C of undercooling to 4.8 × 10−7 m/s at 129°C of undercooling. Crystal growth is slower at other interfaces, in the ranges 1.9–7.6 × 10−8 m/s and 4.5 × 10−9 – 7.6 × 10−8 m/s for the {100} and {001} forms, respectively. Growth in the <010> dimension appears limited to less than 2.4 × 10−8 m/s at 129°C of undercooling. We constrain the uncertainty on these growth velocities, which includes the environmental conditions (± 8.6°C on the nominal undercooling) and the measurements of crystal lengths (underestimated by <16% at most fast interfaces). A systematic and comprehensive review of 19 pre-existing datasets indicates that our linear growth velocities are faster than most growth rates determined at comparable undercoolings. Growth rates determined as half crystal lengths divided by total time are intrinsically low estimates of the true maximum, linear growth velocities, because the total time includes periods of slower or non-growth, and measured crystal dimensions are subject to projection foreshortening or truncation. These errors can lead to values that are several times to several orders of magnitude lower than the true maximum growth rates. This study completes and refines previously published data on the crystallization kinetics of olivine, highlighting the sensitivity of growth rates to specific environmental conditions and measurement methods. We emphasize the importance of symmetrical growth and true maximum growth velocities for interpreting olivine growth histories.
more » « less
Full Text Available
Galaxy of Green

https://doi.org/10.2138/gselements.19.3.173

First, Emily C; Kremer, Christopher; Telus, Myriam; Trang, David (June 2023, Elements)

Olivine occurs across the galaxy, from Earth to extraterrestrial bodies including the Moon, Mars, and asteroids, to particles of comet dust and distant debris disks. The mineral is critical to our understanding of early Solar System chronology, planetary formation processes (e.g., magma ocean solidification), crustal evolution (e.g., volcanic eruptions), and surface weathering. Olivine’s ability to shed light on these processes lies in the linkage of small, physical samples and satellite-derived data. Laboratory spectra become the basis for olivine detection and compositional interpretation in remotely sensed spectra ranging from high-resolution planetary maps to single extra-solar datapoints. In turn, petrologic studies of olivine underpin the geologic interpretations of these spectral datasets. Finally, olivine chemistry records Solar System formation conditions and relative chronology. Olivine is our bridge across time and space.
more » « less
Full Text Available
Olivine—The Little Green Science Machine

https://doi.org/10.2138/gselements.19.3.138

Welsch, Benoît; First, Emily C; Ruprecht, Philipp; Jollands, Michael C (June 2023, Elements)

In some ways, olivine has driven the evolution of the Solar System and likely beyond. As one of the earliest-crystallizing silicate minerals, olivine controls the initial chemical evolution of planet-wide magma oceans and individual lava flows alike. In solid aggregate form, it controls and records deformation of the mantle and smaller-scale intrusive complexes. The components of its crystal structure are mobile at high temperatures and their migration can be used to explore the timing of magmatic events. During chemical weathering, these olivine crystals capture carbon dioxide from the atmosphere as secondary minerals are formed. All of these processes take place not only on Earth, but also on other planetary bodies, making olivine ideally suited to shed light on both primordial planet-building processes and current-day volcanism and surface processes.
more » « less
Full Text Available
Passport: Improving Automated Formal Verification Using Identifiers

https://doi.org/10.1145/3593374

Sanchez-Stern, Alex; First, Emily; Zhou, Timothy; Kaufman, Zhanna; Brun, Yuriy; Ringer, Talia (June 2023, ACM Transactions on Programming Languages and Systems)

Formally verifying system properties is one of the most effective ways of improving system quality, but its high manual effort requirements often render it prohibitively expensive. Tools that automate formal verification by learning from proof corpora to synthesize proofs have just begun to show their promise. These tools are effective because of the richness of the data the proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the powerful logical systems beneath proof assistants. However, this richness remains underexploited, with most work thus far focusing on architecture rather than on how to make the most of the proof data. This article systematically explores how to most effectively exploit one aspect of that proof data: identifiers. We develop the Passport approach, a method for enriching the predictive Coq model used by an existing proof-synthesis tool with three new encoding mechanisms for identifiers: category vocabulary indexing, subword sequence modeling, and path elaboration. We evaluate our approach’s enrichment effect on three existing base tools: ASTactic, Tac, and Tok. In head-to-head comparisons, Passport automatically proves 29% more theorems than the best-performing of these base tools. Combining the three tools enhanced by the Passport approach automatically proves 38% more theorems than combining the three base tools. Finally, together, these base tools and their enhanced versions prove 45% more theorems than the combined base tools. Overall, our findings suggest that modeling identifiers can play a significant role in improving proof synthesis, leading to higher-quality software.
more » « less
Full Text Available
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

https://doi.org/10.1109/ICSE55347.2025.00161

Thompson, Kyle; Saavedra, Nuno; Carrott, Pedro; Fisher, Kevin; Sanchez-Stern, Alex; Brun, Yuriy; Ferreira, João F; Lerner, Sorin; First, Emily (April 2025, IEEE)

Full Text Available
Diversity-Driven Automated Formal Verification

https://doi.org/10.1145/3510003.3510138

First, Emily; Brun, Yuriy (May 2022, Proceedings of the 44th International Conference on Software Engineering (ICSE))

Formally verified correctness is one of the most desirable properties of software systems. But despite great progress made via interactive theorem provers, such as Coq, writing proof scripts for verification remains one of the most effort-intensive (and often prohibitively difficult) software development activities. Recent work has created tools that automatically synthesize proofs or proof scripts. For example, CoqHammer can prove 26.6% of theorems completely automatically by reasoning using precomputed facts, while TacTok and ASTactic, which use machine learning to model proof scripts and then perform biased search through the proof-script space, can prove 12.9% and 12.3% of the theorems, respectively. Further, these three tools are highly complementary; together, they can prove 30.4% of the theorems fully automatically. Our key insight is that control over the learning process can produce a diverse set of models, and that, due to the unique nature of proof synthesis (the existence of the theorem prover, an oracle that infallibly judges a proof's correctness), this diversity can significantly improve these tools' proving power. Accordingly, we develop Diva, which uses a diverse set of models with TacTok's and ASTactic's search mechanism to prove 21.7% of the theorems. That is, Diva proves 68% more theorems than TacTok and 77% more than ASTactic. Complementary to CoqHammer, Diva proves 781 theorems (27% added value) that Coq-Hammer does not, and 364 theorems no existing tool has proved automatically. Together with CoqHammer, Diva proves 33.8% of the theorems, the largest fraction to date. We explore nine dimensions for learning diverse models, and identify which dimensions lead to the most useful diversity. Further, we develop an optimization to speed up Diva's execution by 40X. Our study introduces a completely new idea for using diversity in machine learning to improve the power of state-of-the-art proof-script synthesis techniques, and empirically demonstrates that the improvement is significant on a dataset of 68K theorems from 122 open-source software projects.
more » « less
Full Text Available
PRoofster: Automated Formal Verification

https://doi.org/10.1109/ICSE-Companion58688.2023.00018

Agrawal, Arpan; First, Emily; Kaufman, Zhanna; Reichel, Tom; Zhang, Shizhuo; Zhou, Timothy; Sanchez-Stern, Alex; Ringer, Talia; Brun, Yuriy (May 2023, Proceedings of the Demonstrations Track at the 45th International Conference on Software Engineering (ICSE))

Full Text Available

« Prev Next »

Search for: All records